Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: add rules-keeper #78

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

ashi009
Copy link

@ashi009 ashi009 commented Feb 7, 2023

DO NOT SUBMIT

This CL implements rule-keeper, a GitHub App, to make rule authors' workflow smoother.
For now, it will only collect the data and present them on the ruleset catalog.

As of this CL:

  • collect version data
  • collect version stats data as time series
  • collect project activity data as time series
  • collect project popularity data as time series
  • collect project health data as time series
  • module file data
  • aggregate version stats data as download/week
  • aggregate project activity data as trend
  • aggregate project popularity data as trend
  • render the data

To try this out:

With Github App:

  • create a GitHub App with read-only permission for meta and admin resources
  • download the generated private key
  • install the App to user, repo, or org
  • go run . -app_id=TODO -private_key=TODO

With Personal Access Token:

  • go run . -personal_token=$TOKEN -owner=$OWNER -repo=$REPO

Updates #53

@ashi009
Copy link
Author

ashi009 commented Feb 7, 2023

@kormide PTL

Comment on lines 53 to 54
// releases made at this version.
repeated Release releases = 6;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When would a ruleset version have more than one release?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, I thought Github allowed that.

@kormide
Copy link

kormide commented Mar 2, 2023

We should make a new repository on the SIG account so we can start getting this in in smaller chunks. @alexeagle Do we need to have a vote with the SIG members before we create one?

@kormide
Copy link

kormide commented Mar 2, 2023

nit: I might name the proto folder schema since protobuf is just an implementation detail.

@kormide
Copy link

kormide commented Mar 2, 2023

I'm not super familiar with protobuf. For the rules_ts data you pulled, are the csv and METADATA files a serialization of the protobuf data? I always thought that protobuf was transferred (and stored) in some kind of binary format?

Comment on lines 1 to 4
update_time: {
seconds: 1677692269
nanos: 122313000
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For these METADATA files, is the update_time the time that the data was pulled? Is it necessary to store at the resolution of nanoseconds?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary, but it comes directly from google.golang.org/protobuf/types/known/timestamppb. So I'd rather leave it as it is, instead of writing more code to truncate it.

@kormide
Copy link

kormide commented Mar 2, 2023

Overall it's looking good so far. It might be good to start incrementally building out the interface as you build the schema to verify that we have all the information we'll need.

@ashi009
Copy link
Author

ashi009 commented Mar 2, 2023

I'm not super familiar with protobuf. For the rules_ts data you pulled, are the csv and METADATA files a serialization of the protobuf data? I always thought that protobuf was transferred (and stored) in some kind of binary format?

I figured that the CSV is a compact way of storing time series, which is simple and git-friendly (packfiles). In the meanwhile, we can plot them with existing tools, say gnuplot. Which will make our next step easier.

The METADATA is the corresponding metadata to the CSV file, which is defined in protobuf. The sole reason for using protobuf is to make marshal/unmarshal easier. Protobuf's wire format is binary, but it also has a text format, which is human-readable and git-friendly. It's totally possible to store the time series in proto as well, but we will end up with bloated text files.

That's why I ended up with a combination of two.

@ashi009
Copy link
Author

ashi009 commented Mar 2, 2023

nit: I might name the proto folder schema since protobuf is just an implementation detail.

It's kind of a custom to put proto files in a proto directory. It's an implementation detail for sure, but this directory name will leak into import paths and package names, say github.com/.../rulekeeper/proto and com.bazel.contrib.rulekeeper.rulsets.proto. Having proto in it allows developers (even copilot) to know what's inside without reading the generated code.

Once I introduce rules_go to the repo, I'll remove the generated code. It will make the proto part more important as the source won't even exist in the tree.

string description = 4;

// prerelease marks if the release is a preprelease.
bool preprelease = 5;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool preprelease = 5;
bool prerelease = 5;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants